English to Urdu Statistical Machine Translation: Establishing a Baseline
Abstract
The aim of this paper is to categorize and present the existing resources for English-to-Urdu machine translation (MT) and to establish an empirical baseline for this task. By doing so, we hope to set up a common ground for MT research with Urdu and allow for consistent progress in this field. We build baseline phrase-based MT (PBMT) and hierarchical MT systems and report the results on 3 official independent test sets. On all test sets, hierarchical MT significantly outperformed PBMT. The highest single-reference BLEU score, 21.58%, is achieved by the hierarchical system, but this figure depends on the randomly selected test set. Our manual evaluation of 175 sentences suggests that the hierarchical MT output is ranked better than the PBMT output in 45% of sentences, while PBMT wins in 21% of sentences, the rest being equal.
Similar resources
Model for English-Urdu Statistical Machine Translation
There are more than 60 million first-language speakers of Urdu and more than 104 million second-language speakers. A lot of the knowledge on the internet that is useful to these Urdu speakers is available in English. The contrast in typology between the two languages is interesting to study for Statistical Machine Translation. However, there is almost no parallel aligned data available freely for the selected language pa...
Improving Machine Translation via Triangulation and Transliteration
In this paper we improve Urdu→Hindi English machine translation through triangulation and transliteration. First we built an Urdu→Hindi SMT system by inducing triangulated and transliterated phrase-tables from Urdu–English and Hindi–English phrase translation models. We then use it to translate the Urdu part of the Urdu-English parallel data into Hindi, thus creating an artificial Hindi-English...
Word-Order Issues in English-to-Urdu Statistical Machine Translation
We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experi...
Development of Parallel Corpus and English to Urdu Statistical Machine Translation
In this paper we describe our efforts to develop a parallel corpus for statistical machine translation of English text into Urdu. We share and discuss certain issues faced during this effort.
Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach
We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet repor...
Publication date: 2014